A First Attempt to Computing Generic Set Partitions: Delegation to an SQL Query Engine

Authors

  • Frédéric Dumonceaux
  • Guillaume Raschia
  • Marc Gelgon
Abstract

Partitions are a very common and useful way of organizing data in data engineering and data mining. However, partitions currently lack efficient and generic data management functionalities. This paper proposes advances in the understanding of this problem, as well as elements for solving it. We formulate the task as efficiently processing, evaluating, and optimizing queries over set partitions in the setting of relational databases. Producing universally fast execution plans remains challenging, since the underlying relational model has a significant impact on the algebraic definition of the operators and therefore on their implementation in terms of space and time costs. We first demonstrate that there is no trivial relational modeling for managing collections of partitions. We formally motivate a relational encoding and show that one cannot express all the operators of the partition lattice and set-theoretic operations as queries of the relational algebra. We investigate SQL features beyond first-order logic (FO) to build optimized queries for partition operators. We provide several pieces of evidence of the inefficiency of FO queries. Our experimental results reinforce this evidence, even accounting for careful SQL query optimization. We claim that there is a strong requirement for the design of a dedicated system to manage set partitions, or at least to supplement an existing data management system, to which both data persistence and query processing could be delegated.

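To make the idea of delegating a partition operator to an SQL engine concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table `part` and its columns `pid`, `block`, and `elem`, as well as the helper `meet_blocks`, are assumptions made for illustration only; they are not the relational encoding motivated in the paper. The sketch computes the meet of two partitions (their coarsest common refinement) with a plain self-join, which is one of the operators that a first-order query can express.

```python
import sqlite3


def meet_blocks(p, q):
    """Return the blocks of the meet (coarsest common refinement) of p and q.

    p and q are dicts mapping each element to a block label; both are
    assumed to cover the same set of elements.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical encoding: one row per (partition id, block label, element).
    conn.execute("CREATE TABLE part (pid, block, elem)")
    conn.executemany(
        "INSERT INTO part VALUES ('p', ?, ?)", [(b, e) for e, b in p.items()]
    )
    conn.executemany(
        "INSERT INTO part VALUES ('q', ?, ?)", [(b, e) for e, b in q.items()]
    )
    # Two elements share a block of the meet exactly when they share a block
    # in both p and q, so each (p-block, q-block) pair labels one meet block.
    rows = conn.execute(
        """
        SELECT a.block, b.block, a.elem
        FROM part AS a JOIN part AS b ON a.elem = b.elem
        WHERE a.pid = 'p' AND b.pid = 'q'
        """
    ).fetchall()
    conn.close()

    blocks = {}
    for p_block, q_block, elem in rows:
        blocks.setdefault((p_block, q_block), set()).add(elem)
    return sorted(sorted(members) for members in blocks.values())


if __name__ == "__main__":
    p = {"a": 1, "b": 1, "c": 2, "d": 2}
    q = {"a": "x", "b": "y", "c": "y", "d": "y"}
    print(meet_blocks(p, q))  # [['a'], ['b'], ['c', 'd']]
```

The join of two partitions is a different matter: merging overlapping blocks amounts to a transitive closure, which would typically need recursion or another feature beyond plain relational algebra.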

Similar resources

INDREX: In-database relation extraction

The management of text data has a long-standing history in the human mankind. A particular common task is extracting relations from text. Typically, the user performs this task with two separate systems, a relation extraction system and an SQL-based query engine for analytical tasks. During this iterative analytical workflow, the user must frequently ship data between these systems. Worse, the ...

SQLI-Dagger, a Multilevel Template based Algorithm to Detect and Prevent SQL Injection

SQL injection attacks are often found within the dynamic pages of a web application that exploit the security vulnerability of the database layers of an application. In this attack category a specifically crafted SQL command is entered in the form field of a web application instead of the expected information. SQL injection takes advantage of the design flaws in poorly designed web application...

A Fast and High Throughput SQL Query System for Big Data

Relational data query always plays an important role in data analysis. But how to scale out the traditional SQL query system is a challenging problem. In this paper, we introduce a fast, high throughput and scalable system to perform read-only SQL well with the advantage of NoSQL’s distributed architecture. We adopt HBase as the storage layer and design a distributed query engine (DQE) collabor...

k-Efficient partitions of graphs

A set $S = \{u_1, u_2, \ldots, u_t\}$ of vertices of $G$ is an efficient dominating set if every vertex of $G$ is dominated exactly once by the vertices of $S$. Letting $U_i$ denote the set of vertices dominated by $u_i$, we note that $\{U_1, U_2, \ldots, U_t\}$ is a partition of the vertex set of $G$ and that each $U_i$ contains the vertex $u_i$ and all the vertices at distance 1 from it in $G$. In this ...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

Publication date: 2014